The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning

Authors

  • Vivek S. Borkar
  • Sean P. Meyn
Abstract

It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. It is found that the results provide (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) a proof, for the first time, that a class of asynchronous stochastic approximation algorithms is convergent without any a priori assumption of stability; (iii) a proof, for the first time, that asynchronous adaptive critic and Q-learning algorithms are convergent for the average-cost optimal control problem.
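
For orientation, the setting is the standard stochastic approximation recursion and its mean-flow ODE; the sketch below states the scaled-ODE stability criterion in generic notation (the symbols are assumptions of this summary, not quoted from the paper):

```latex
% Stochastic approximation recursion: step sizes a_n, noise M_{n+1}
\theta_{n+1} = \theta_n + a_n \bigl( h(\theta_n) + M_{n+1} \bigr),
\qquad \sum_n a_n = \infty, \quad \sum_n a_n^2 < \infty.

% Associated ODE (the mean flow that the iterates track):
\dot{\theta}(t) = h\bigl(\theta(t)\bigr).

% Scaled vector field; if the origin is globally asymptotically stable
% for the scaled ODE, the iterates remain bounded almost surely,
% and convergence then follows:
h_\infty(\theta) = \lim_{r \to \infty} \frac{h(r\theta)}{r},
\qquad \dot{\theta}(t) = h_\infty\bigl(\theta(t)\bigr).
```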

Similar resources

Stability and Convergence of Stochastic Approximation using the O.D.E. Method, in Proceedings of the 37th IEEE Conference on Decision and Control, 1998

It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated o.d.e. This in turn implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. It is found that the results provide (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) ...


Learning Algorithms for Risk-Sensitive Control

This is a survey of some reinforcement learning algorithms for risk-sensitive control on the infinite horizon. Basics of the risk-sensitive control problem are recalled, notably the corresponding dynamic programming equation and the value and policy iteration methods for its solution. Basics of stochastic approximation algorithms are also sketched, in particular the 'o.d.e.' approach for its stability ...
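
For concreteness, one common form of the dynamic programming equation recalled in such surveys is the multiplicative (risk-sensitive average-cost) equation below; the finite-MDP notation is an assumption of this summary, not taken from the survey itself:

```latex
% Risk-sensitive average-cost DP equation for a finite MDP with one-step
% cost c(x,u), transition kernel p(y|x,u), and value function V > 0;
% \log\lambda is the optimal risk-sensitive average cost:
\lambda\, V(x) = \min_{u}\; e^{c(x,u)} \sum_{y} p(y \mid x, u)\, V(y).
```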


LIDS-P-2172: Asynchronous Stochastic Approximation and Q-Learning

We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
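
As a concrete reference point, tabular Q-learning with asynchronous updates (only the visited state-action entry changes per step) and diminishing step sizes can be sketched as below; the environment interface and hyperparameters are hypothetical, and this is a generic sketch rather than the report's specific algorithm statement:

```python
import numpy as np

def q_learning(env, n_states, n_actions, gamma=0.95, episodes=500, eps=0.1):
    """Tabular Q-learning with asynchronous updates and diminishing steps."""
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))   # per-(s, a) update counts
    for _ in range(episodes):
        s = env.reset()                        # assumed: returns an int state
        done = False
        while not done:
            # epsilon-greedy action selection
            if np.random.rand() < eps:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s2, r, done = env.step(a)          # assumed interface
            visits[s, a] += 1
            alpha = 1.0 / visits[s, a]         # step size a_n = 1/n per entry
            # asynchronous update: only the visited (s, a) entry changes
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
            s = s2
    return Q
```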


Approximation of Stochastic Parabolic Differential Equations with Two Different Finite Difference Schemes

We focus on the use of two stable and accurate explicit finite difference schemes in order to approximate the solution of stochastic partial differential equations of Itô type, in particular, parabolic equations. The main properties of these deterministic difference methods, i.e., convergence, consistency, and stability, are separately developed for the stochastic cases.
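
To illustrate the flavor of such schemes, below is one explicit finite difference / Euler-Maruyama step for a model stochastic heat equation du = nu*u_xx dt + sigma dW; the equation, noise scaling, and parameters are illustrative assumptions of this summary, not the paper's two specific schemes:

```python
import numpy as np

def stochastic_heat_step(u, dx, dt, nu=0.1, sigma=0.05, rng=None):
    """One explicit Euler-Maruyama step for du = nu*u_xx dt + sigma dW."""
    if rng is None:
        rng = np.random.default_rng()
    # second-order central difference for u_xx on interior points
    lap = np.zeros_like(u)
    lap[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    # discretised space-time white noise: N(0,1) samples scaled by sqrt(dt/dx)
    xi = rng.standard_normal(u.shape)
    u_new = u + nu * dt * lap + sigma * np.sqrt(dt / dx) * xi
    u_new[0] = u_new[-1] = 0.0   # Dirichlet boundary conditions
    return u_new

# Mean-square stability of this explicit scheme requires roughly
# nu * dt / dx**2 <= 1/2, mirroring the deterministic CFL-type condition.
```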


Approximation of stochastic advection diffusion equations with finite difference scheme

In this paper, a high-order and conditionally stable stochastic difference scheme is proposed for the numerical solution of the Itô stochastic advection-diffusion equation with a one-dimensional white noise process. We apply a fourth-order finite difference approximation for discretizing the spatial derivative of this equation. The main properties of deterministic difference schemes,...
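
For reference, a stochastic advection-diffusion model of this type, together with the standard fourth-order central difference for the advection derivative, can be written as follows; the coefficients and the choice of stencil are generic assumptions of this summary, not the paper's exact discretization:

```latex
% Model stochastic advection-diffusion equation with 1-D white noise \dot W:
\frac{\partial u}{\partial t} + a\,\frac{\partial u}{\partial x}
  = \nu\,\frac{\partial^2 u}{\partial x^2} + \sigma\,\dot{W}(t).

% Fourth-order central difference for the spatial (advection) derivative:
u_x(x_j) \approx \frac{-u_{j+2} + 8u_{j+1} - 8u_{j-1} + u_{j-2}}{12\,\Delta x}.
```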



Journal:
  • SIAM J. Control and Optimization

Volume 38, Issue

Pages -

Publication date: 2000